feat: improve cse bootstrap latency by deferring non-critical work #8105
Pull request overview
This PR reduces Linux CSE bootstrap critical-path work by deferring non-essential steps until after ensureKubelet, and updates generated pkg/agent/testdata snapshots to reflect the new CSE/custom data output.
Changes:
- Defers ensureNoDupOnPromiscuBridge, enableLocalDNS, and non-GPU driver cleanup until after ensureKubelet in cse_main.sh.
- Optimizes provisioning/runtime setup by switching kube binary activation to mv + chmod, and by reloading only a targeted sysctl file instead of running sysctl --system.
- Updates VHD cleanup to disable containerd and regenerates the pkg/agent/testdata CustomData snapshots.
Reviewed changes
Copilot reviewed 18 out of 75 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| vhdbuilder/packer/cleanup-vhd.sh | Disables containerd during VHD cleanup to avoid shipping images with it enabled. |
| parts/linux/cloud-init/artifacts/cse_main.sh | Defers some non-critical steps until after ensureKubelet; skips container runtime install for golden images/OSGuard. |
| parts/linux/cloud-init/artifacts/cse_install.sh | Changes kubelet/kubectl “activation” to mv + chmod to avoid redundant copy work. |
| parts/linux/cloud-init/artifacts/cse_config.sh | Uses targeted sysctl -p and starts kubelet before the TLS bootstrapping latency measurement service. |
| pkg/agent/testdata/MarinerV2+Kata/CustomData | Regenerated snapshot for updated CSE/custom data output. |
| pkg/agent/testdata/CustomizedImage/CustomData | Regenerated snapshot for updated CSE/custom data output. |
    mv "/opt/bin/kubelet-${KUBERNETES_VERSION}" /opt/bin/kubelet
    mv "/opt/bin/kubectl-${KUBERNETES_VERSION}" /opt/bin/kubectl

    chmod a+x /opt/bin/kubelet /opt/bin/kubectl
this was what was before, keeping it as is
Why this change? I'm not following: was install cleaner, but slower?
Also, why not force the access level?
also curious about both
install does a copy, not a move.
It copies the file to the destination. A key difference from cp is that install unlinks (removes) the destination file first if it already exists, which prevents errors such as ETXTBSY ("Text file busy") when replacing a running executable.
I kept the operation as it was before chewi made the change, to avoid a regression. I'm not sure whether it was better or worse, but it was guaranteed to work with no regression.
Regression? My change was merged two months ago. There are important reasons to use install over cp, including the one stated above. There are cases where the destination will be an existing symlink, and it is crucial that we replace the symlink, not its target. mv will do that, but I can't remember if there was some other reason why I didn't stick with mv.
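The symlink point can be checked directly. Below is a minimal bash sketch, assuming the versioned binary and its destination sit on the same filesystem (as /opt/bin/kubelet-${KUBERNETES_VERSION} and /opt/bin/kubelet do), so that mv is a single rename(2). `activateBinary` is a hypothetical helper name, not the code in cse_install.sh, and the explicit 0755 mode is one possible answer to "force the access level" instead of only adding execute bits.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the actual cse_install.sh code): put a downloaded,
# versioned binary in place and force its permissions.
# Usage: activateBinary SRC DEST
activateBinary() {
    local src="$1" dest="$2"
    # On the same filesystem, mv is a rename(2): it replaces the directory
    # entry at $dest. If $dest is a symlink, the link itself is replaced and
    # its target is left untouched. $dest is never opened for writing, so a
    # running executable at $dest does not cause "Text file busy".
    mv -f "$src" "$dest" || return 1
    # Set an explicit mode rather than only adding the execute bits.
    chmod 0755 "$dest"
}
```

By contrast, plain `cp src dest` follows a symlink at dest and overwrites the target's contents, which is the case described above. Normalizing ownership (for example `chown root:root`) would require root and is left out of this sketch.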
@chewi, would you be OK with moving it back to mv? It definitely seems faster in this case.
7e6264c to 192e020
192e020 to c95a094
Pull request overview
This PR aims to reduce Linux CSE bootstrap critical-path latency by deferring non-critical steps until after ensureKubelet, avoiding redundant work (targeted sysctl reload, moving kube binaries), and adjusting VHD build/runtime behaviors around containerd.
Changes:
- Reorders CSE provisioning steps so kubelet starts earlier; starts kubelet before measure-tls-bootstrapping-latency.service.
- Optimizes provisioning work (targeted sysctl -p, mv + chmod for kube binaries, skip runtime install when the golden image already contains it).
- Adjusts VHD build scripts/tests to ensure containerd is started when needed and disabled during image cleanup; regenerates pkg/agent/testdata.
Reviewed changes
Copilot reviewed 18 out of 77 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| vhdbuilder/packer/trivy-scan.sh | Sources provision helpers and ensures containerd is started before Trivy operations. |
| vhdbuilder/packer/test/linux-vhd-content-test.sh | Starts containerd before executing VHD validation tests. |
| vhdbuilder/packer/cleanup-vhd.sh | Disables containerd during VHD cleanup. |
| parts/linux/cloud-init/artifacts/cse_main.sh | Defers non-critical steps until after ensureKubelet; skips container runtime install on golden images. |
| parts/linux/cloud-init/artifacts/cse_install.sh | Uses mv + chmod when activating downloaded kubelet/kubectl. |
| parts/linux/cloud-init/artifacts/cse_config.sh | Uses targeted sysctl -p and starts kubelet before the TLS bootstrapping latency measurement service. |
| pkg/agent/testdata/MarinerV2+Kata/CustomData | Regenerated snapshot output for MarinerV2+Kata CustomData. |
| pkg/agent/testdata/CustomizedImage/CustomData | Regenerated snapshot output for CustomizedImage CustomData. |
293a28c to 58051a7
58051a7 to f9809eb
f9809eb to e3b7f18
e3b7f18 to 7fc7c13
7fc7c13 to fff45dc
    @@ -536,21 +537,37 @@ systemctlEnableAndStartNoBlock() {
            systemctl status $service --no-pager -l > /var/log/azure/$service-status.log || true
            return 1
        fi
    }
systemctlEnableAndStartNoBlock uses systemctl restart --no-block, which only enqueues the job and can return success even if the unit immediately transitions to failed afterward. ensureKubelet relies on this returning non-zero to detect startup failures, so this can mask real failures (kubelet/measure-tls/containerd). Consider adding a short post-start health check (e.g., systemctl is-failed/ActiveState after a delay) or reintroducing an optional delay parameter so callers can fail fast when the unit enters failed.
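One possible shape for the suggested post-start check is sketched below in bash. It assumes `systemctl show --property=ActiveState --value` is available (systemd 230 or later). The helper name `waitForUnitNotFailed`, its defaults, and the `SYSTEMCTL` override are hypothetical, and this is not the checkServiceHealth helper the PR adds. It fails fast if the unit reports `failed` at any point during a short observation window and otherwise requires `active` at the end.

```shell
#!/usr/bin/env bash
# Hypothetical post-start check for units started with
# `systemctl restart --no-block`. SYSTEMCTL can be overridden (e.g. with a
# stub function) so the logic can be exercised without systemd.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

unitActiveState() {
    "$SYSTEMCTL" show --property=ActiveState --value "$1"
}

# Usage: waitForUnitNotFailed UNIT [POLLS] [INTERVAL_SECONDS]
waitForUnitNotFailed() {
    local unit="$1" polls="${2:-10}" interval="${3:-0.5}"
    local state i
    for ((i = 0; i < polls; i++)); do
        state="$(unitActiveState "$unit")"
        if [ "$state" = "failed" ]; then
            echo "$unit entered the failed state" >&2
            return 1
        fi
        # Any other state, including a transient "active" for a unit that
        # may still crash shortly after starting: keep watching.
        sleep "$interval"
    done
    state="$(unitActiveState "$unit")"
    if [ "$state" != "active" ]; then
        echo "$unit is '$state', not active, after the observation window" >&2
        return 1
    fi
    return 0
}
```

A caller such as ensureKubelet could then chain the non-blocking start with `waitForUnitNotFailed kubelet`, so a unit that fails right after its job is enqueued surfaces as a non-zero return. The observation window spends a few seconds of wall-clock time to get failure detection back, which is in tension with the PR's goal of shortening the critical path.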
fff45dc to b938c10
b938c10 to e9962a7
Devinwong left a comment
Thanks for walking through the changes. LGTM
    ExecStartPre=-/sbin/iptables -t nat --numeric --list

    ExecStartPre=/bin/bash /opt/azure/containers/validate-kubelet-credentials.sh
    ExecStartPre=/bin/sh -c 'until [ -S /run/containerd/containerd.sock ]; do sleep 0.1; done'
is this to avoid kubelet going into a bad exponential back-off or something?
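The inline until-loop in that ExecStartPre line has no deadline of its own, so it is bounded only by the unit's start timeout (TimeoutStartSec, which also covers ExecStartPre commands). A minimal bash sketch of a bounded variant follows. `waitForSocket`, its 60-second default, and the polling interval are assumptions, and this is not the PR's waitForContainerdReady.

```shell
#!/usr/bin/env bash
# Hypothetical bounded wait for a UNIX socket (e.g. containerd's), instead of
# an open-ended `until [ -S ... ]` loop.
# Usage: waitForSocket PATH [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
waitForSocket() {
    local path="$1" timeout_s="${2:-60}" interval_s="${3:-0.1}"
    local deadline=$(( SECONDS + timeout_s ))
    until [ -S "$path" ]; do
        if (( SECONDS >= deadline )); then
            echo "timed out after ${timeout_s}s waiting for socket $path" >&2
            return 1
        fi
        sleep "$interval_s"
    done
    return 0
}
```

Because a failing ExecStartPre (one without the `-` prefix) fails the start, kubelet would then fall through to whatever Restart= policy the unit has instead of sitting in the loop. Wrapping the existing loop in coreutils `timeout 60 sh -c '...'` is a one-line alternative with a similar effect.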
    #!/bin/bash -eux

    systemctl daemon-reload
    systemctl disable --now containerd
nit: I'd prefer we do this in post-install-dependencies.sh. This script is also run by the image builder service after optimization is complete (it shouldn't break anything if the image builder service runs it too, though I think it's cleaner to keep this scoped to our build-specific logic).
            exit $VALIDATION_ERR
        fi

    checkServiceHealth containerd || exit $ERR_SYSTEMCTL_START_FAIL
nit: it would be better to have specific exit codes here; that way we can always pinpoint exactly which call to checkServiceHealth is failing.
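The nit amounts to giving each call site its own exit code. A minimal bash sketch follows. The ERR_* names and numeric values are hypothetical (the real codes are defined elsewhere in the CSE scripts and are not shown here), and checkServiceHealth is replaced by a simple `systemctl is-active` stand-in because its real implementation is not part of this excerpt.

```shell
#!/usr/bin/env bash
# Hypothetical per-call-site exit codes (values are illustrative only).
ERR_CONTAINERD_UNHEALTHY_BEFORE_PAUSE_IMAGE=230
ERR_CONTAINERD_UNHEALTHY_BEFORE_GPU_SETUP=231

# Stand-in for the PR's checkServiceHealth: healthy means "active" per systemd.
checkServiceHealth() {
    systemctl is-active --quiet "$1"
}

# Exit with the caller-supplied code so logs identify the failing call site.
# Usage: requireServiceHealthy SERVICE EXIT_CODE
requireServiceHealthy() {
    local service="$1" exit_code="$2"
    if checkServiceHealth "$service"; then
        return 0
    fi
    echo "$service is not healthy; exiting with $exit_code" >&2
    exit "$exit_code"
}

# Example call sites:
#   requireServiceHealthy containerd "$ERR_CONTAINERD_UNHEALTHY_BEFORE_PAUSE_IMAGE"
#   requireServiceHealthy containerd "$ERR_CONTAINERD_UNHEALTHY_BEFORE_GPU_SETUP"
```

Passing the code in, rather than hard-coding one shared ERR_SYSTEMCTL_START_FAIL, keeps the health-check logic in one place while still letting the provisioning exit status point at a single call.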
Summary
What changed
- In cse_main.sh, skip container runtime installation on Azure Linux OS Guard unless an explicit containerd override is provided, pre-warm kubelet, move containerd ulimit configuration earlier, and defer ensureNoDupOnPromiscuBridge plus non-GPU cleanup until after ensureKubelet
- In cse_config.sh, switch containerd startup to the non-blocking helper, wait for containerd before artifact streaming / pause image / GPU-driver work, load nf_conntrack, and record the TLS bootstrap start time before kubelet so the latency service measures the full kubelet bootstrap window
- In cse_helpers.sh and kubelet.service, add checkServiceHealth, waitForContainerdReady, and an ExecStartPre wait on the containerd socket so later work runs only after services are actually ready
- In measure-tls-bootstrapping-latency.sh, emit the completed event using the recorded kubelet start timestamp even when the kubeconfig already exists, or is created during the race window before inotifywait starts listening
- In cse_install.sh and packer scripts, move downloaded kubelet/kubectl binaries into place with normalized ownership/permissions, restart containerd where packer validation needs it, and disable containerd during VHD cleanup so images do not ship with the service unexpectedly enabled

Timings

[Before / After timing comparison]
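The measure-tls-bootstrapping-latency change in the summary comes down to "record the start time before kubelet starts, and measure against that record even if the kubeconfig is already there". A minimal bash sketch under those assumptions follows. The file path, function names, and the polling wait (used here instead of inotifywait, so there is no window in which a file creation can be missed) are all hypothetical, not the script's actual code. It relies on GNU date for `%3N` (milliseconds) and `-r FILE` (a file's modification time).

```shell
#!/usr/bin/env bash
# Hypothetical path and names; the real script's are not shown in this PR.
START_FILE="${START_FILE:-/tmp/kubelet-bootstrap-start-ms}"

# Call just before starting kubelet.
recordBootstrapStart() {
    date +%s%3N > "$START_FILE"    # epoch milliseconds (GNU date)
}

# Print milliseconds from the recorded start until the kubeconfig was written.
# The existence check runs before any waiting, so a kubeconfig created before
# this function started is still reported. The end time is the file's mtime,
# not "now", so the result does not depend on when the file was noticed.
# (A kubeconfig that predates the recorded start would yield a negative value.)
# Usage: reportBootstrapLatency KUBECONFIG_PATH [TIMEOUT_SECONDS]
reportBootstrapLatency() {
    local kubeconfig="$1" timeout_s="${2:-300}"
    local start end deadline
    start="$(cat "$START_FILE")" || return 1
    deadline=$(( SECONDS + timeout_s ))
    until [ -f "$kubeconfig" ]; do
        if (( SECONDS >= deadline )); then
            echo "kubeconfig $kubeconfig not created within ${timeout_s}s" >&2
            return 1
        fi
        sleep 0.5
    done
    end="$(date -r "$kubeconfig" +%s%3N)"
    echo $(( end - start ))
}
```

Polling trades up to half a second of detection delay for simplicity; since the end time comes from the file's mtime, that delay does not inflate the reported latency.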